Genome-wide association analysis of anthracnose resistance in sorghum [Sorghum bicolor (L.) Moench]

In warm-humid ago-ecologies of the world, sorghum [Sorghum bicolor (L.) Moench] production is severely affected by anthracnose disease caused by Colletotrichum sublineolum Henn. New sources of anthracnose resistance should be identified to introgress novel genes into susceptible varieties in resistance breeding programs. The objective of this study was to determine genome-wide association of Diversity Arrays Technology Sequencing (DArTseq) based single nucleotide polymorphisms (SNP) markers and anthracnose resistance genes in diverse sorghum populations for resistance breeding. Three hundred sixty-six sorghum populations were assessed for anthracnose resistance in three seasons in western Ethiopia using artificial inoculation. Data on anthracnose severity and the relative area under the disease progress curve were computed. Furthermore, the test populations were genotyped using SNP markers with DArTseq protocol. Population structure analysis and genome-wide association mapping were undertaken based on 11,643 SNPs with <10% missing data. The evaluated population was grouped into eight distinct genetic clusters. A total of eight significant (P < 0.001) marker-trait associations (MTAs) were detected, explaining 4.86–15.9% of the phenotypic variation for anthracnose resistance. Out of which the four markers were above the cutoff point. The significant MTAs in the assessed sorghum population are useful for marker-assisted selection (MAS) in anthracnose resistance breeding programs and for gene and quantitative trait loci (QTL) mapping.


Introduction
Sorghum [Sorghum bicolor (L.) Moench, 2n = 2x = 20] is an important cereal crop cultivated globally for multiple uses [1]. It is a mainstay crop in arid and semi-arid agro-ecologies due to its relatively higher drought tolerance compared to other common cereal crops such as maize and wheat. Various constraints affect sorghum production and productivity, notably by biotic stresses such as diseases, weeds (Striga species), and insect pests. Anthracnose, grain mold, leaf marker and anthracnose resistance to identify and introgress novel genes into susceptible varieties by resistance breeding programs to enhance the productivity of the crop. In light of the above background, this study's objective was to determine the genome-wide association of Diversity Arrays Technology Sequencing based single nucleotide polymorphism markers and anthracnose resistance genes in diverse sorghum populations of Ethiopia for resistance breeding.

Plant materials
The study used 366 sorghum collections sourced from different geographical locations in Ethiopia, including the Afar, Amhara, Benshangul Gumuz, Dire Dawa, Gambella, Oromiya, SNNP, Somali, and Tigray regions acquired from Melkassa Agricultural Research Center in Ethiopia. Three improved varieties ('Gemedi', 'Geremew' and 'Btx623') were included as comparative controls. Btx623 is anthracnose susceptible variety obtained from Texas A and M University, USA. The descriptions of the test accessions are presented in Table 1.

Study site and experimental design
The genotypes were evaluated for anthracnose resistance at Bako (9º6' N; 37º9' E) Agricultural Research Center (BARC) in Ethiopia. The centre receives an annual rainfall of 1,600 mm, while the mean maximum and minimum temperatures are 29 ºC and 13 ºC, respectively. The mean monthly relative humidity varies from 46 to 57%, while the main rainy season ranges from May to October, with the most rain received in July and August. Short rains are also received from March to June. The trends of weather data for BARC during the study periods (2016-2018) are summarized in Fig 1. The soil type of the study site is nitosol. The genotypes were established using a 61 x 6 row by column incomplete block design with three replications. Each plot consisted of a single row of 2.1 m long with inter-row and intra-row spacing of 75 cm and 15 cm, respectively.

Pathotyping for anthracnose reaction, data collection and analysis
Based on preliminary evaluations, a single virulent anthracnose spore sourced from the Bako area was aseptically isolated, multiplied, and used to inoculate plants on 45 days after sowing using the procedure of Prom et al. (2019) [24]. The 366 genotypes were rigorously screened for anthracnose resistance at Bako for three seasons (2016-2018). Data on percentage severity of leaf area damaged by the anthracnose were recorded beginning from 15 days after inoculation five times at 10 days' intervals from five randomly tagged plants. The percentage of total leaf area of plants damaged by anthracnose were recorded following [25]. The final disease rating was measured 55 days after inoculation and was referred to as a final anthracnose severity (FAS). FAS was used to distinguish the anthracnose reaction of lines based on the level of severity. Mean severity percentage values for each plot were used for data analysis. Disease severity for anthracnose was used to calculate the area under the disease progress curve (AUDPC). The AUDPC were calculated for each sorghum accession based on [26] as follows: The AUDPC values were converted into the relative area under the disease progress curve (rAUDPC) as a ratio of the actual AUDPC of a sorghum entry to the AUDPC of a susceptible

PLOS ONE
landrace (Acc#239182) across the three cropping seasons (months May-October). The data were checked for normality, homogeneity of variance and validity for analysis of variance following the Bartlet test [27]. Data collected were analyzed using SAS computer software [28].

DNA extraction and DArT sequencing
Due to the poor germination rate, 313 sorghum genotypes were assayed for genomic analysis out of the 366 genotypes. Test genotypes were planted in a greenhouse at Holeta National Agricultural Biotechnology Research Center using seedling trays. Eight leaf disc samples were collected from fresh and young leaf tissue using a 4×96 deep well sample collection plate three weeks after seedling emergence and sent to the Biosciences eastern and central Africa (BecA) hub of the International Livestock Research Institute (BecA-ILRI) in Kenya. Genomic DNA was extracted at BecA-ILRI/Kenya following the plant DNA extraction protocol for DArT [29]. The quality of DNA was checked for nucleic acid concentration and purity using a Nano-Drop 2000 spectrophotometer (ND-2000 V3.5, NanoDrop Technologies, Inc). Samples were genotyped using the DArTseq protocol with 24,634 SNP markers. After eliminating the DArT loci with unknown chromosome positions and filtering markers with more than 10% missing data, a total of 11,643 markers distributed across the 10 chromosomes were maintained for analysis. The markers used had SNPs with call rate > 97%, and allele-calling equal or greater than 98% were selected. Genotypes with read depth less than the threshold were coded as missing. SNP markers have high rates of genotype missingness (>10%) and rare SNPs with <5% minor allele frequency (MAF) were discarded.

Marker-trait data analysis
Population structure. Principal component analysis (PCA) was computed in the R package ggplot2 [30]. Model-based maximum likelihood analysis of population structure was calculated using the ADMIXTURE program, a high-performance tool for estimating ancestry in unrelated individuals [31]. To discern the optimum number of ancestral populations, ADMIX-TURE was run with a 10-fold cross-validation procedure for K values varying from 2 to 20. The K value with the lowest standard error was selected. The graphical representation of the admixture patterns was depicted using the R package pophelper [32]. A membership coefficient greater than 0.70 discerned if a genotype is belonged to the group, as stated by Ketema et al. (2020) [33]. Admixed genotypes had a membership coefficient of less than 0.70 at each assigned K.

Marker-trait association
Data on anthracnose severity and GBS based on 11,643 robust SNP markers were used for GWAS analysis. The FarmCPU Model [34] was used to perform an association analysis using the R software's GAPIT package [16]. The model splits the mixed linear model into a fixed effect and a random effect model to minimize false negatives caused by confounding population structure and SNPs. A coancestry matrix from ADMIXTURE was included as a covariate in GAPIT to reduce spurious associations.
The GWAS results were visually examined using the Manhattan and quantile-quantile (QQ) plots. The plots were generated using the CMPlot R package [35]. The cutoff of significant association was a False Discovery Rate (FDR) adjusted p-value less than 0.1, which was computed using the Benjamini and Hochberg procedure to control for multiple testing [36] and an exploratory significance cutoff p <0.001 was also used. SNPs with a high probability of contribution to essential traits were tracked to the specific chromosome location based on the sorghum reference genome sequence, version 3.1.1 available at the Phytozome v12 [37].

Identification of candidate genes
Candidate genes were identified based on the significant SNP markers associated with the traits. This was achieved using the physical genome assembly of sorghum reference genome sequence, version 3.1.1 (https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org Sbicolor) serving as identifying candidate genes between 20 kb on either side of significant SNPs. The putative functional candidate genes that co-localized with associated SNPs were annotated based on similarity to known annotated genes in other species such as Arabidopsis and tomato.

Anthracnose resistance
Combined analysis of variance indicated significant differences (P < 0.01) due to the effects of genotype, season and genotype by season interaction for FAS and rAUDPC. These suggested the presence of considerable genetic variability among the test populations to select anthracnose resistant lines and aiding marker-trait analysis ( Table 2).
There were 32 genotypes that scored lower rAUDPC values at the final stage of severity rating that ranged between 29.4 (entry 74685) to 40.9 (223562). These lines were identified as moderately resistant, making them ideal selections for anthracnose resistance breeding. Conversely, 334 genotypes were susceptible and expressed higher rAUDPC values ranging between 43.6 and 91.0. The rAUDPC showing the progression of disease severity on selected resistant and susceptible test genotypes during severity assessment intervals is presented in Fig 2. Four genotypes (239182, 72564, 73048 and 238428) scored the highest rAUDPC values compared with the anthracnose susceptible check (Btx623). The mean values for final severity and rAUDPC are presented in S1 Table. Marker characterization Population structure. Population structure analysis resolved eight sub-populations based on SNPs profiles. The Admixture algorithm was used to determine the population structure of the 313 sorghum genotypes. The ADMIXTURE analysis was performed for different K values varying from 2 to 20. Cross-validation error estimates for the ADMIXTURE models steeply

PLOS ONE
decreased from K = 2 to K = 8, and increased steadily to a higher level at K = 20 (Fig 3). The optimal number of sub-populations were at K = 8 (Fig 4). Individual ancestry was estimated for different K-values. Each cluster consisted of a genetically diverse and variable number of entries. About 65% of the assessed genotypes (203

PLOS ONE
accessions) were allocated in one population with a high ancestry membership coefficient with a likelihood of more than 0.60. The remaining 35% of test populations represented accessions with high admixture, which is expected from landrace populations.
A scree plot was generated to visualize the number of principal components. Overall, 10 principal components were identified of which principal components 1 (PC1), PC2 and PC3 explained relatively the highest proportion to the total variance ( Fig 5A). Principal component analysis (PCA) using PC1 and PC2 stratified the test populations based on areas of collection ( Fig 5B). The allocation of test genotypes were irrespective to the origin of collections.

Marker-trait association
A summary of the genome-wide association analysis of anthracnose resistance and SNP markers amongst 313 sorghum collections is presented in Table 4. Eight significant (P < 0.001) marker-trait associations were resolved on chromosomes 1, 4, 6, 8, 9 and 10 ( Table 4 and Fig  6A). The quantile-quantile plots (Fig 6B) of p-values were examined to determine how well the models accounted for population structure and familial relatedness.
Three markers, including rs1887698 and rs2681689 located on chromosome 9 and rs1884746 on chromosome 10 had a significant (P < 0.001) association with anthracnose resistance. Markers rs1887698 and rs2681689 are significantly associated with chromosome 9 with P = 4.4 x 10 −06 and 4.99 x 10 −06 , respectively, explaining 15.9% of the total variation for anthracnose resistance. Whereas marker rs1884746 located on chromosome 10 (P = 4.28 x 10 −05 ) explained 15.0% of the total variation. Furthermore, SNPs significantly associated with anthracnose resistance were detected on chromosomes 1, 8, 9, 6 and 4, which explained 4.86 to 5.23% of the total variation among the tested sorghum collections for anthracnose resistance. Overall, eight SNPs were found to affect anthracnose resistance (Table 4), of which four SNPs were above the cutoff point ( Fig 6A). Hence, the significant MTAs localized on the specific chromosomes influence anthracnose resistance and could be exploited in gene introgression and selection. Also, the proportion of the phenotypic variation (R 2 > 4) observed for all significant markers suggests their possible influence on anthracnose resistance.

Putative genes that condition anthracnose resistance
A blast search of potential anthracnose resistance genes identified three candidate genes. The two candidate genes were located on chromosome 9, while one gene on chromosome 10. The two genes on chromosome 9 are annotated with protein-coding genes, including Sobic.009G008800 (xylem cysteine proteases) and Sobic.009G126300 (Threonine-specific protein Table 3. The proportion of the membership of each predefined population in each of the clusters obtained at optimum K (K = 8). Afar  8  25  0  13  13  36  0  13  0  0   Amhara  44  31  11  5  13  20  4  4  9  3   Benishangul  2  11  22  11  11  12  0  11  kinase) linked to markers rs2681689 and rs1887698, respectively. In addition, one gene was annotated with protein-coding gene Sobic.010G012200 (Gluconokinase/Gluconate kinase) and linked to marker rs1884746 located on chromosome 10. The SNP data set used in the study is presented in S2 Table.

Discussion
The existence of significant genotype by season interaction effects on anthracnose resistance suggests considerable genetic diversity among the assessed genotypes, and genotype performance varies across seasons. These results support earlier studies that there is potentially adequate genetic variation among Ethiopian sorghum to select for anthracnose resistance and to support genetic analysis, gene discovery and breeding efforts [8,9,11,38]. Genetic management of sorghum anthracnose requires unique sources of resistance for breeding programs. To identify new sources of anthracnose resistance genes, this study determined the population structure and association of genomic regions with anthracnose resistance. The studied sorghum population represents a diverse population of sorghum collections from Ethiopia, it's center of origin and diversity. Sorghum anthracnose resistance present in the test population was discerned through rigorous field screening in three seasons. Previous studies assessed the genetic basis of anthracnose resistance in sorghum [7]. The authors reported three loci on chromosome 5 amongst 335 US accessions using GWAS. In addition, Prom et al. (2019) [10] used 359 sorghum populations and identified quantitative trait loci on chromosome 8 conditioning resistance to anthracnose.
The population structure generated using hierarchical clustering, admixture, and principal component analysis identified a clear differentiation of the assessed sorghum collections (Figs  4 and 5). The test population was grouped into eight distinct genetic groups (Fig 4).
Previous studies of population structure identified 11 genetic groups using 1,425 Ethiopian sorghum accessions [12], while 374 accessions from Ethiopia were separated into 11 populations [39]. In addition, a total of 318 Sudanese sorghum core collections were evaluated with 183,144 SNP markers using the model-based clustering method that portrayed five subpopulations [11]. Furthermore, 940 diverse sorghum landraces of Ethiopia were assessed using 54,080 SNP markers that identified 12 subpopulations [40].
Markers associated with anthracnose resistance were identified on chromosomes 9 and 10, suggesting the value of these genomic regions in gene introgression and pyramiding programs. Previously, markers for anthracnose resistance in sorghum collections were reported in the US Table 4. Summary on genome-wide association analysis of anthracnose resistance and SNP markers amongst 313 sorghum collections indicating significant markers, alleles detected with chromosome number and position, significant value, coefficient of determination (R 2 ) and annotation. sorghum collections. These markers were identified on chromosome 5 [7,9] and chromosome 9 [8] using SNP markers. The new markers are potentially useful for marker-assisted breeding of anthracnose resistance in the assessed sorghum populations [41]. The current study identified eight significant (P < 0.001) marker-trait associations on chromosomes 1, 4, 6, 8, 9 and 10 ( Table 4 and Fig 6A). The quantile-quantile plots ( Fig 6B) showed a perfect fit of the observed and expected population structures for anthracnose resistance using the FarmCPU Model. The novel genes are additions to previously identified genomic regions conditioning resistance to anthracnose, downy mildew, and head smut in sorghum [4,[7][8][9]. Four SNP markers above the cutoff point (Fig 6A) are novel anthracnose resistance markers detected in this study. The markers are located on chromosomes 9, 10 and 4. Previously [4] used 242 mini core accessions and three sorghum cultivars and were subjected to genome-wide association analysis. The authors reported the anthracnose resistance gene on chromosome 8. In addition [9], examined 114 recombinant inbred lines (RILs) obtained from crosses of sorghum anthracnose resistant line SC112-14 (PI533918) and susceptible line PI609251 and identified tightly linked anthracnose resistance locus on chromosome 5. Through blast analysis, the present study identified three candidate anthracnose resistance genes. The two genes are located on chromosome 9, while one gene is on chromosome 10. The two genes located on chromosome 9 are annotated with protein-coding genes Sobic.009G008800 and Sobic.009G126300 linked to markers rs1887698 and rs2681689, respectively. The Sobic.009G008800 reportedly had a strong association with annotated function xylem cysteine peptidase 1 of sorghum conditioning disease resistance. The biological effect of the gene includes programmed cell death, responsive to dehydration, while the annotated sorghum genes had roles in anthracnose resistance on chromosome 9 [42,43]. Pogány et al. (2015) [42] reported that cysteine proteases is one of the first Arabidopsis proteases functioning in the immune system of sorghum. In a mutant strain of Arabidopsis, overexpression of the aspartic protease Constitutive Disease Resistance 1 (CDR1) resulted in resistance to the virulent strain of Pseudomonas syringae, a bacterium that causes diseases of monocots, herbaceous dicots, and woody dicots. Cysteine proteases also contain a cysteine nucleophilic residue that performs a nucleophilic attack proteolysis resulting in an intermediate state where the enzyme is covalently attached to its substrate [44]. Cysteine proteases reportedly conferred immunity in tomatoes against diseases caused by Phytophthora infestans (Pinf) and Cladosporium fulvum (syn. Passalora fulva) [45,46].

SNP ID
In the present study, a second significant candidate gene, Sobic.009G126300 was located on chromosome 9 linked to marker rs1887698 with annotated function similar to Serine/threonine-protein kinase CTR1. Serine/threonine-protein kinase is reportedly known for mediating drought stress tolerance in plants. Furthermore, the threonine-protein kinase is believed to have a role in regulating ethylene hormone pathways conditioning drought response in senna (Cassia angustifolia Vahl.) [47]. The 3 rd candidate gene identified in this study is linked to the marker rs1884746 annotated with Sobic.010G012200 (Gluconokinase/Gluconate kinase). Gluconate kinase is an essential enzyme of the oxidative pentose phosphate pathway responsible for NADPH supplies during fatty acid synthesis in developing embryos of Thermotoga maritima [48,49].
The anthracnose resistance candidate genes identified in the present study will be validated across multiple seasons and locations as ideal molecular markers for anthracnose resistance breeding programs. The novel genes could be stacked into anthracnose susceptible sorghum lines through marker-assisted or recurrent selection method. A combination of novel QTLs would render durable resistance to sorghum anthracnose.

Conclusion
Marker trait association is key to identifying genomic regions associated with anthracnose resistance for marker-assisted breeding in sorghum improvement programs. The present study identified four novel markers associated with anthracnose resistance designated as rs1887698, rs2681689, rs1884746 and rs100028710. The new genetic markers identified in the current sorghum populations are valuable genomic resources for future parental selection, quantitative trait loci analysis, trait introgression, gene pyramiding and marker-assisted selection of anthracnose resistance in sorghum breeding programs in Ethiopia or related agro-ecologies. Further studies are required to validate the significant markers identified in the present study.
Supporting information S1